K‐means cluster analysis of characteristic patterns of allergen in different ages: Real life study

Abstract
Background: Atopy varies in people of different ages owing to different physical conditions and exposure to allergens. We aimed to cluster ages based on atopic severity using K‐means cluster analysis and to identify atopic incidence and severity, as well as the association among peripheral eosinophils, IgE and sensitisation.
Methods: Consecutive patients (n = 7654) with allergic symptoms who underwent allergen‐specific IgE tests from 2013 to 2017 were included. Age, sex, specific‐IgE, peripheral eosinophil counts and total‐IgE were collected.
Results: Five age categories were identified: 1–17, 18–36, 37–52, 53–69 and 70–100 years. The incidences of atopy and poly‐sensitisation decreased with increasing age. A similar trend was observed for aeroallergens, egg and milk but not for peanuts, soy or seafood. Dust mites remained the most important allergen in patients with allergic symptoms, especially children and adolescents. In patients aged <52 years, sensitisation to aeroallergens was more prevalent than sensitisation to food allergens. In the 37–52‐year group, the incidence of atopy was higher in females than in males. The overlap of atopy, high eosinophils and high total‐IgE was found in only 19.18% of patients. The trend of allergen‐test positivity did not parallel those of total IgE and peripheral eosinophil counts.
Conclusion: Age grouping based on cluster analysis helps to reveal changes in atopic status and in the distribution of sensitised allergens with age. Allergen tests are still necessary in clinical diagnosis and treatment. An innovative exploration of the influence of age and allergens on total‐IgE and eosinophil counts is helpful for the development of bio‐targeted precision therapy.
Clinical Trial Registration: ChiCTR2300067700.


| INTRODUCTION
In the last two decades, the incidence of allergic diseases, such as asthma, rhinitis and dermatitis, has increased rapidly worldwide, including in China. Data from a random young population showed that approximately 50% of people with atopy, defined as a personal and/or familial tendency to produce IgE antibodies in response to ordinary exposure to allergens, have symptoms referable to atopy, including asthma, rhinitis and dermatitis. 1 Atopy is associated with the development, 2 severity and clinical control of allergic diseases. [3][4][5][6] The incidence of atopy in random populations, defined as one or more positive results on skin prick testing with a small battery of common relevant allergens, ranges from 30% to 50%. 1,7,8 Age influences both the incidence of sensitisation and whether sensitised people show clinical reactivity to allergens. 9 However, in previous atopy-related studies, age grouping was often subjective.
Other sequential age categories were also used [10][11][12]; however, the reasons and principles of age division were not clarified. In adults aged 19-60 years, further subcategories of age have not been adequately studied, although heterogeneity of atopic status between ages has been reported. 10 Inappropriate age classification may understate or overstate the true impact of age on atopy and allergen distribution. Therefore, studies that identify age categories by clustering and then investigate the characteristics of allergen sensitisation in the identified sequential age categories are necessary.
In primary hospitals, total IgE (t-IgE) and blood eosinophils are commonly used as biomarkers for atopy or allergy. However, their predictive value for atopy requires further exploration. Therefore, a single-centre retrospective study was performed in consecutive patients who visited the Shanghai General Hospital and had their serum allergen-specific IgE (s-IgE) assessed from January 2013 to December 2017. The aims of this study were to (1) cluster ages based on atopic severity with the K-means cluster analysis and (2) identify atopic severity according to the age categories, t-IgE and peripheral eosinophils in those patients, in order to investigate a possible relationship between these values and atopy.

| Study patients
A single-centre, retrospective study was performed on consecutive patients who visited the Shanghai General Hospital for allergic symptoms and underwent serum allergen s-IgE tests from January 2013 to December 2017. Patient charts were retrospectively reviewed.

| Study design
The age, sex, date of hospital visit and serum allergen s-IgE test results were collected and analysed. T-IgE and peripheral eosinophil values were also collected, if available.

The ethics committee of Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, approved the protocol and a waiver of informed consent was issued for our study (number: 2017KY159).
T-IgE was measured using UniCAP 1000 and defined as elevated if it exceeded 60 IU/mL. Eosinophils were measured using an automatic complete blood count with differential blood test and expressed as absolute values and percentages of cell counts. The absolute eosinophil count was considered high when it exceeded 0.15 × 10^9 cells/L.
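The two cut-offs stated above can be applied to each patient record as in the following minimal sketch. The function name, constant names and handling of missing values are illustrative assumptions and are not taken from the study; only the thresholds (60 IU/mL and 0.15 × 10^9 cells/L) come from the text.

```python
# Cut-offs as stated in the text; all identifiers here are hypothetical.
T_IGE_CUTOFF_IU_ML = 60.0        # t-IgE is elevated above this value
EOS_CUTOFF_1E9_PER_L = 0.15      # absolute eosinophil count is high above this value


def classify_biomarkers(t_ige_iu_ml, eos_abs_1e9_per_l):
    """Return (elevated_t_ige, high_eosinophils).

    A value of None means the measurement was not available, in which
    case the corresponding flag is also None (an assumption of this sketch).
    """
    elevated = None if t_ige_iu_ml is None else t_ige_iu_ml > T_IGE_CUTOFF_IU_ML
    high_eos = (
        None if eos_abs_1e9_per_l is None
        else eos_abs_1e9_per_l > EOS_CUTOFF_1E9_PER_L
    )
    return elevated, high_eos


# Hypothetical records: (t-IgE in IU/mL, eosinophils in 10^9 cells/L)
for record in [(120.0, 0.30), (45.0, 0.10), (None, 0.20)]:
    print(record, classify_biomarkers(*record))
```

Both comparisons are strict, following the wording "exceeded" in the text, so a value exactly at the cut-off is not flagged.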

| K-means clustering analysis
The K-means clustering algorithm, a type of unsupervised machine learning used to identify homogeneous subgroups from unlabelled input data, 13 was used in this study to group ages into k clusters for allergen sensitisation. The algorithm minimises the within-cluster variance, that is, the sum of squared distances between each data point and the centroid of its cluster.
The analysis was conducted as follows: Step 1: k (number of clusters, starting from 5) data objects were randomly extracted and the centroids of each cluster were computed. Step 2: All data are
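For illustration only, a generic K-means loop (Lloyd's algorithm) on a single numeric feature can be sketched as follows. This is not the authors' implementation: the choice of age as the only feature, the random initialisation from distinct values, the stopping rule and the toy data are all assumptions made for this sketch.

```python
import random


def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: (v - centroids[j]) ** 2)
        for v in values
    ]


def kmeans_1d(values, k, n_iter=100, seed=0):
    """Generic 1-D K-means; requires k <= number of distinct values."""
    rng = random.Random(seed)
    # Initialise: pick k distinct data values as starting centroids.
    centroids = rng.sample(sorted(set(values)), k)
    for _ in range(n_iter):
        labels = assign(values, centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Update step: centroid = mean of members (kept if cluster is empty).
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    centroids.sort()
    return centroids, assign(values, centroids)


# Hypothetical ages, not study data.
ages = [3, 5, 8, 25, 28, 31, 70, 75, 80]
centroids, labels = kmeans_1d(ages, k=3)
print(centroids, labels)
```

With one feature and sorted centroids, each cluster covers a contiguous range of values, which is why this kind of clustering can yield sequential age categories. The result can depend on the random initialisation, so several seeds are usually compared in practice.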

| Analysis
Baseline data are presented descriptively. Normality of distribution was assessed using the Shapiro-Wilk test. Normally distributed data are expressed as mean ± standard deviation. Non-normally distributed data are expressed as median and interquartile range (IQR). Heatmaps were created using GraphPad Prism (version 9.0; GraphPad Software). Multiple comparisons were made using Dunn's test. Differences between aeroallergens and food allergens were compared using the Mann-Whitney test. The threshold for statistical significance for all analyses was set at p < 0.05.

| Baseline characteristics
Visual exploration of the heatmap data (Figure S2) identified several age groups with a relatively high prevalence of atopy. Therefore, the K-means cluster analysis was performed to identify homogeneous subgroups based on age for allergen sensitisation.

| K-means cluster analysis
We performed the K-means cluster analysis with five, six, seven and

| Distribution of sensitised allergens by five age categories
The incidence of atopy decreased with increasing age: 71.41% for the age group of 1-17 years, 59.79% for 18-36 years, 47.60% for 37-52 years, 44.50% for 53-69 years and 37.62% for 70-100 years (Table S1). A similar trend of decreasing incidence with age was observed for the most common aeroallergens: dust mites, house dust, cats, B. germanica and mould combination; sensitisation to them had the highest incidence and was the most severe in the age group of 1-17 years (Figures 2A and 3A). The incidence of sensitisation to dust mites, which was the most common allergen, was 32% in the age group of 1-17 years, with an average SI of 1.00.
However, the incidence of food allergens exhibited different traits. Sensitisation to egg or milk was more prevalent in the age group of 1-17 years, with the highest SI as well (Figures 2A and 3B).
Sensitisation to peanuts and soy maintained a stable incidence and severity among the five age categories. Among the food allergens, only seafood (crab, shrimp and fish combination) showed a higher incidence and severity of sensitisation in the age group of 70-100 years than in the age groups of 37-52 and 53-69 years (Figure 3B). In patients aged <52 years, the SI of aeroallergens was higher than that of food allergens (Figure 4C). Aeroallergens and food allergens had a similar incidence of sensitisation in patients aged 53-100 years. The incidence of atopy was higher in females than in males only in the age group of 37-52 years, whereas the SI was higher in males than in females in all age groups (Figure 2C,D).
The proportion of patients who were allergic to only one allergen was similar across all age categories (Figure 4D). The incidence of poly-sensitisation decreased with increasing age. Overall, 45% of children and adolescents were allergic to more than two allergens. Correspondingly, the proportion of non-atopic patients increased with age.
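The per-category incidences reported above can be computed along the lines of the following sketch. The five age categories are those identified in the study; the record layout, the function names, the toy data and the definition of poly-sensitisation as two or more positive allergens are assumptions made for illustration.

```python
from bisect import bisect_left

AGE_GROUPS = ["1-17", "18-36", "37-52", "53-69", "70-100"]
UPPER_BOUNDS = [17, 36, 52, 69]  # inclusive upper bounds of the first four groups


def age_group(age):
    """Map an integer age (1-100) to one of the five age categories."""
    return AGE_GROUPS[bisect_left(UPPER_BOUNDS, age)]


def incidence_by_group(records):
    """records: iterable of (age, number_of_positive_allergens).

    Returns {group: (atopy %, poly-sensitisation %)} for non-empty groups.
    Assumption: atopy means >= 1 positive allergen and poly-sensitisation
    means >= 2 positive allergens.
    """
    counts = {g: [0, 0, 0] for g in AGE_GROUPS}  # [n, atopic, poly-sensitised]
    for age, n_positive in records:
        c = counts[age_group(age)]
        c[0] += 1
        c[1] += n_positive >= 1
        c[2] += n_positive >= 2
    return {
        g: (100 * atopic / n, 100 * poly / n)
        for g, (n, atopic, poly) in counts.items()
        if n
    }


# Hypothetical patients, not study data.
toy = [(5, 0), (5, 1), (10, 3), (40, 2), (75, 0)]
for group, (atopy_pct, poly_pct) in incidence_by_group(toy).items():
    print(f"{group}: atopy {atopy_pct:.1f}%, poly-sensitisation {poly_pct:.1f}%")
```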

| The characteristics of t-IgE and eosinophil in atopic patients
T-IgE and peripheral eosinophil counts varied in patients allergic to different allergens (

| DISCUSSION
This study used the K-means cluster analysis to identify more natural age categories retrospectively in a consecutive cohort of 7654 patients who underwent allergen testing over a 5-year period.
To our knowledge, this is the first study focusing on the relationship among atopy, t-IgE and eosinophils in such a large population. Based on our age categories, we found that the SI and the proportion of poly-sensitisation decreased with increasing age, regardless of allergen type; the SI of aeroallergens was higher than that of food allergens in every age group; and the incidence of atopy was higher in females than in males in the age group of 37-52 years, whereas the SI was lower in females than in males in all age groups. In addition, the trend of allergen-test positivity did not parallel those of t-IgE and peripheral eosinophil counts. We believe that our findings will be helpful for allergic disease research in different age groups.
The immunoblotting test of serum s-IgE is used to evaluate sensitisation to various allergens and has high specificity, as it avoids the influence of concomitant medication and operator experience. 14 High serum s-IgE predicts more airway events than the skin prick test (SPT). 15,16 Furthermore, the pain and inconvenience of skin pricking have a negative impact on patients' willingness to undergo tests and limit the number of allergen SPTs performed. The immunoblotting test for serum s-IgE lacks this limitation and could provide complete information about sensitisation to more allergens in this study, as both aeroallergens and food allergens were assessed and collected for each patient.
Based on the age categories from our K-means clustering analysis, sensitisation to all aeroallergens, egg and milk declined with increasing age in both incidence and severity, as did the incidence of poly-sensitisation. 17 Therefore, more attention should be paid to sensitisation in children and adolescents. Gender differences in atopy were observed in some age groups. The incidence of atopy was higher in males than in females in all age groups except 37-52 years, whereas the SI may be higher in males than in females in all age groups. Sex hormones may explain sex differences in allergic diseases and may increase the risk of sensitisation in females aged 37-52 years. [18][19][20] Previous studies only indicated that allergic diseases are more common in males than females before puberty, with a reversal after puberty. 21
Different allergens were associated with different degrees of elevation in IgE and eosinophils; in our study, sensitisation to cats was associated with the highest level of t-IgE. A previous study reported that allergens from mammals were more strongly associated with high s-IgE. 22 Higher eosinophil levels appeared more often with aeroallergens, which suggests that aeroallergens more often drive increases in SI, IgE and eosinophils. However, allergen-test positivity is not highly parallel to t-IgE and peripheral eosinophil counts. 10,23,24 Therefore, the effectiveness of t-IgE or blood eosinophil count alone in determining atopy is limited and a combination of the two is more valuable in ruling out or
However, the clinical benefits of mite avoidance remain unclear. A meta-analysis of 44 trials on mite avoidance concluded that mite control measures are not recommended for asthma. 31 However, its conclusions were challenged because it did not distinguish between adult and paediatric studies. 32 The tendency of dust mite sensitisation to decrease with increasing age noted in our study suggests possible benefits of mite avoidance in children. For children and adolescents, early allergen-control intervention may be effective in reducing the number of sensitised children with asthma attending the hospital with asthma exacerbations in adulthood. [33][34][35]
As for food allergens, the two most common are milk and eggs. Sensitisation to eggs and milk is more prevalent in children than in adolescents and adults, 36-38 consistent with our findings. In contrast, seafood allergy affects a substantial proportion of adults, many of whom develop the disease during adulthood. 39 The mechanism remains unknown; degeneration of the immune system and of gastrointestinal tract function 40 may contribute to the slight increase in severity in patients aged 70-100 years. The incidence of peanut sensitisation remained stable, with low severity, across all age groups, indicating that peanut allergies may persist throughout life. 41,42
Our study has some limitations. First, our data were obtained from a single centre, although some of our patients came from adjacent provinces such as Jiangsu, Anhui and Zhejiang, so the cohort may reflect conditions in east China. Therefore, large-scale epidemiological studies need to be performed. In addition, the statistical methods used in the analyses assume a cause-effect relationship between the risk factors and atopy. Future prospective intervention studies are necessary to explore the mechanisms of allergy in specific age groups.

| CONCLUSION
Distinct demographic features such as age are associated with atopy and allergic diseases. In our large-scale cohort study, the sensitive indexes of most allergens decreased with increasing age. Increased
[Figure legend fragment] (B) food allergens. *, # and $ indicate p < 0.05; ** and ## indicate p < 0.01; ***, ### and $$$ indicate p < 0.001. * compared with the age group of 1-17 years; # compared with the age group of 18-36 years; $ compared with the age group of 37-52 years.